-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R] Replace vignettes and examples #11123
Conversation
Thanks for the I know nothing about I can at least tell you how we're addressing these things LightGBM, maybe it will be helpful. In
output:
markdown::html_format:
options:
toc: true
number_sections: true
vignette: >
%\VignetteIndexEntry{Basic Walkthrough}
%\VignetteEngine{knitr::knitr}
%\VignetteEncoding{UTF-8}
It's a little awkward that the R docs are not technically part of the readthedocs site, but overall this has worked pretty well for us. Some notes:
Even if you don't pursue this specific mix, I do recommend not checking the |
Thanks for the suggestions, but while LightGBM's solution works in the sense of providing rendered artifacts, I don't think it's an ideal solution here - the doc page URL differs from the Sphinx one and is not as easily indexable/searchable as an embedded .md or .ipynb file in the same sphinx page. I've added a script and instructions to update the Still waiting from comments from @mayer79 if there are any. |
Let me update the branch to prevent CI hang on mgpu tests. It's a AWS instance. |
Sure, those are good points. You have to balance whether those benefits you listed are worth the added build complexity, heavier set of dependencies, and increased risk of CRAN rejections. Not my call to make here in |
I will take a look at the doc build tomorrow. |
Hi, @david-cortes. May I ask what qdm provides that rmd doesn't in terms of generating |
For now, I think We can continue to use knitr for vignettes unless qmd provides extra capabilities that rmd doesn't have. For information about sphinx domain, please see https://www.sphinx-doc.org/en/master/usage/domains/index.html . |
As far as I'm aware, Issue is: it appears to use jupyter to generate formats like |
@trivialfis Example of how the result would look like: Would be ideal to find a way to keep it looking like this. |
Thank you for sharing! We have a script to create markdown files, see changes in a #11166, specially around the makefile. Although I'm not sure how knitr works. I think the amount of time we need to spend to make a satisfactory result by various conversions will be about the same as writing a proper sphinx extension. As a result, I suggest we use pkgdown for now. If you feel the sphinx integration is important for the R community, planning a future sphinx extension is better use of time than hacking various markdown flavors. |
Please note that the knitr script doesn't create R api reference, while the pkgdown does. |
Actually on a second look: turns out that knitr does support building to .md (but not to .ipynb unfortunately, so we couldn't add plots or similar), without IR. So no need to use .qmd. Now the next question: could this be arranged in such a way that the .md file is built programmatically by the CI and commited to the repository? |
If you believe the vegnittes should be rendered as sphinx doc, we can continue to use knitr and leave the API references to pkgdown |
Yes, that'd be ideal. But how would that be achieved? |
Good question, I am sure I can enforce new commits to update the MD files, but I don't know if I can build the MD files automatically. In the referenced pr, the doc builder hangs. |
That might not be a good idea - the .md would change for example whenever the random numbers produced differ, or when there are numerical differences in results between different CI machines, and so on. |
Opened an issue for RTD readthedocs/readthedocs.org#11928 I will try to reproduce using their container image tomorrow. |
@trivialfis In the CI, the Makefile under I don't see any trace of it in the RTD log. |
It's manual, I have never looked into it before. In my pr, I added some changes to the Hopefully there's a better way to do it instead of burdening the RTD server |
This reverts commit f09b656.
Ok, then let's leave the sphinx rendering for a different PR. I've removed all the references to .md files from this one, leaving only the part about adding a new vignette and examples. |
ref #9810
closes #10746
This PR adds a new introductory vignette which replaces most of the previous ones, and modifies the code examples throughout functions aimed at interactive usage to call
xgboost()
instead ofxgb.train()
.Motivation
Since the time that XGBoost was first published at CRAN, its adoption and mindshare have risen substantially, to the point that it has become the standard when it comes to boosted decision trees. In this day and age, I don't think the package needs to provide any introduction to the concepts of gradient-boosting, cross-validation, evaluation metrics, and so on - people who use R are already going to be familiar with those, and the things it compares against (like the package 'gbm') have become obsolete by now.
As well, the documentation and tutorials for XGBoost have mostly moved to the online docs - any R-specific documents become outdated rather soon, and are less likely to be seen by a random user. Most of the python examples and guides should in any event work with the R interface with very minimal modifications like dict->list.
Apart from becoming a standard-use library, the features supported by XGBoost have expanded over time, and lots of the materials that were there before, such as the first vignette, contained tips that are not applicable to the current state of the library, like manually one-hot encoding categorical features.
Hence, I decided to remove the previous vignettes and create a new one from scratch, which contains only examples around the usage of the R interface and its conventions.
Help needed
It would be ideal if this vignette could also get added to the online docs.
Thus, I created the vignette as a quarto file (.qmd), which has the option to render to both .html (what CRAN hosts) and .md (which can be included in the .rst files).
Only, getting it to render to .md required building the vignette with jupyter instead of knitr, which in turn requires installs of python, jupyter, ipykernel, and the "ir" kernel that runs R in jupyter, plus registering that kernel in the user-level config for jupyter. By adding that line "jupyter: ir", it additionally makes the default quarto render (e.g. as used by the "knit" button in RStudio) build the .html vignette using jupyter instead of knitr, which is most definitely not going to work out in CRAN servers. I don't know how to solve this.
Would also be nice if some CI job could be auto-building the .md file for the online docs from the .qmd source of the vignette.
(CCing @mayer79 and @jameslamb )